Smart Paper Notes

Some Strategies to Capture Karaka-Yogyata with Special Reference to apadana

Swaraja Salaskar , Diptesh Kanojia , Malhar Kulkarni

Category: Natural Language Processing

2022-01-05

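Language technology has gained importance in today's digital world. Several software tools have been developed and are available in the field of computational linguistics. Such tools play a crucial role in making classical-language texts easily accessible. Some Indian philosophical schools have contributed various techniques of verbal cognition for analyzing sentences correctly. These theories can be used to build computational tools for word sense disambiguation (WSD). Without WSD, one cannot have proper verbal cognition. These theories treat the concept of "Yogyatā" (congruity or compatibility) as an indispensable cause of verbal cognition. In this work, we offer some insights based on these theories for creating a tool that captures the Yogyatā of words. We describe the problem of ambiguity in a text and present a method for resolving it computationally with the help of Yogyatā. Only two major schools, Nyāya and Vyākarana, are considered here. Our paper attempts to show the implications of creating such a tool in this area. The tool also involves creating an "ontological tag-set" as well as strategies for marking up the lexicon. The paper also gives an introductory description of ablation (apadana). These strategies and some case studies form the core of the paper.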

translated by Google Translate

Strategies of Effective Digitization of Commentaries and Sub-commentaries: Towards the Construction of Textual History

Diptesh Kanojia , Malhar Kulkarni , Sayali Ghodekar , Eivind Kahrs , Pushpak Bhattacharyya

Category: Natural Language Processing

2022-01-05

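This paper describes additional aspects of a digital tool called the "Textual History Tool". We describe its salient features, with special reference to those that may help philologists digitize commentaries and sub-commentaries on a text. The tool captures the historical evolution of a text through various temporal stages, together with interrelated data culled from various types of related texts. We use the text of the Kāśikāvrtti (KV) as a sample text and, with the help of philologists, digitize the commentaries available to us. We digitize the Nyāsa (Ny) and the Padamañjarī (Pm), and the sub-commentaries known as the Tantrapradīpa (Tp) and the Makaranda (Mk). We divide each commentary and sub-commentary into functional units and describe the methodology and motivation behind this division. Our functional unit division helps generate more accurate phylogenetic trees for the text, based on distance methods applied to the data entered in the tool.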


Utilizing Wordnets for Cognate Detection among Indian Languages

Diptesh Kanojia , Kevin Patel , Pushpak Bhattacharyya , Malhar Kulkarni , Gholamreza Haffari

Category: Natural Language Processing

2021-12-30

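Automatic Cognate Detection (ACD) is a challenging task that has been used to help NLP applications such as machine translation, information retrieval, and computational phylogenetics. Unidentified cognate pairs can pose a challenge to these applications and degrade their performance. In this paper, we detect cognate word pairs between Hindi and ten Indian languages and use deep learning methods to predict whether a word pair is cognate. We identify IndoWordnet as a potential resource for detecting cognate word pairs with orthographic-similarity-based methods, and we train neural network models on the data obtained from it. We identify parallel corpora as another potential resource and run the same experiments on them. We also validate the contribution of Wordnets through further experiments and report performance improvements of up to 26%. We discuss the nuances of cognate detection among closely related Indian languages and release the lists of detected cognates as a dataset. We also observe the behaviour of Indian language pairs that are, to some extent, unrelated, and release the lists of cognates detected among them as well.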
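One common way to make "orthographic similarity" concrete is a normalized edit distance between two word forms. The sketch below is only an illustration under that assumption; the function names and the 0.5 threshold are hypothetical, and the abstract does not say which similarity measures or thresholds the authors actually used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def orthographic_similarity(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def candidate_cognates(pairs, threshold=0.5):
    """Keep word pairs whose similarity meets a (hypothetical) threshold."""
    return [(a, b) for a, b in pairs
            if orthographic_similarity(a, b) >= threshold]


if __name__ == "__main__":
    pairs = [("hund", "hound"), ("fonema", "phoneme"), ("dog", "perro")]
    for a, b in pairs:
        print(a, b, round(orthographic_similarity(a, b), 3))
    print(candidate_cognates(pairs))
```

A surface score like this can only propose candidates. Pairs that look alike but differ in meaning would pass the filter too, so a real pipeline would still need semantic evidence such as the Wordnet links the abstract mentions.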


Challenge Dataset of Cognates and False Friend Pairs from Indian Languages

Diptesh Kanojia , Pushpak Bhattacharyya , Malhar Kulkarni , Gholamreza Haffari

Category: Natural Language Processing

2021-12-17

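Cognates are present in multiple variants of the same text across different languages (e.g., "Hund" in German and "hound" in English both mean "dog"). They pose a challenge to various Natural Language Processing (NLP) applications such as machine translation, cross-lingual sense disambiguation, computational phylogenetics, and information retrieval. A possible solution to this challenge is to identify cognates across language pairs. In this paper, we describe the creation of two cognate datasets for twelve Indian languages, namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. We digitize the cognate data from an Indian-language cognate dictionary and use linked Indian-language Wordnets to generate cognate sets. In addition, we use the Wordnet data to create a false friends dataset for eleven language pairs. We also evaluate the efficacy of our datasets using previously available baseline cognate detection approaches. Finally, we perform a manual evaluation with the help of lexicographers and release the curated gold-standard dataset with this paper.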


Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages

Diptesh Kanojia , Raj Dabre , Shubham Dewangan , Pushpak Bhattacharyya , Gholamreza Haffari , Malhar Kulkarni

Category: Natural Language Processing

2021-12-16

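Cognates are variants of the same lexical form across different languages; for example, "fonema" in Spanish and "phoneme" in English are cognates, both meaning "a unit of sound". Automatically detecting cognates between any two languages can help downstream NLP tasks such as cross-lingual information retrieval, computational phylogenetics, and machine translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages. Our approach introduces the use of context from a knowledge graph to generate improved feature representations for cognate detection. We then evaluate the impact of our cognate detection mechanism on neural machine translation (NMT) as a downstream task. We evaluate our methods for detecting cognates on a challenging dataset of twelve Indian languages, namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. In addition, we create evaluation datasets for two more Indian languages, Konkani and Nepali. We observe an improvement of up to 18 percentage points in F-score for cognate detection. Furthermore, we observe that cognates extracted with our method help improve NMT quality by up to 2.76 BLEU. We also publicly release our code, the newly constructed datasets, and the cross-lingual models.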


Cognition-aware Cognate Detection

Diptesh Kanojia , Prashant Sharma , Sayali Ghodekar , Pushpak Bhattacharyya , Gholamreza Haffari , Malhar Kulkarni

Category: Natural Language Processing | Artificial Intelligence

2021-12-15

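Automatic detection of cognates helps downstream NLP tasks such as machine translation, cross-lingual information retrieval, computational phylogenetics, and cross-lingual named entity recognition. Previous approaches to cognate detection use feature sets based on orthographic, phonetic, and semantic similarity. In this paper, we propose a novel method for enriching these feature sets with cognitive features extracted from human readers' gaze behaviour. We collect gaze behaviour data for a small sample of cognates and show that the extracted cognitive features help the task of cognate detection. However, collecting and annotating gaze data is costly. We therefore use the collected gaze behaviour data to predict cognitive features for a larger sample and show that the predicted cognitive features also significantly improve task performance. Over previously proposed approaches, we report improvements of 10% with the collected gaze features and 12% with the predicted gaze features. Furthermore, we release the collected gaze behaviour data along with our code and cross-lingual models.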


Explainable Artificial Intelligence in Retinal Imaging for the detection of Systemic Diseases

Ayushi Raj Bhatt , Rajkumar Vaghashiya , Meghna Kulkarni , Dr Prakash Kamaraj

Category: Computer Vision | Machine Learning

2022-12-14

This study presents explainable Artificial Intelligence (AI) in the form of an interpretable, semiautomatic approach to stage grading of ocular pathologies such as diabetic retinopathy, hypertensive retinopathy, and other retinopathies against the backdrop of major systemic diseases. The experimental study aims to evaluate an explainable staged grading process without using deep Convolutional Neural Networks (CNNs) directly. Many current CNN-based deep neural networks used for diagnosing retinal disorders may perform appreciably well but fail to pinpoint the basis driving their decisions. To improve the transparency of these decisions, we propose a clinician-in-the-loop assisted intelligent workflow that performs a retinal vascular assessment on fundus images to derive quantifiable and descriptive parameters. The retinal vessel parameter metadata serve as hyper-parameters for better interpretation and explainability of decisions. The semiautomatic methodology aims at a federated approach to AI in healthcare applications, with more inputs and interpretations from clinicians. The baseline process in the machine learning pipeline uses image processing techniques for optic disc detection, vessel segmentation, and arteriole/venule identification.


Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

Jiyan He , Xuechen Li , Da Yu , Huishuai Zhang , Janardhan Kulkarni , Yin Tat Lee , Arturs Backurs , Nenghai Yu , Jiang Bian

Category: Machine Learning | Machine Learning (Statistics)

2022-12-03

Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of group-wise clipping. To reduce the compute time overhead of private learning, we show that per-layer clipping, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with per-device clipping that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $\epsilon=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.
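The difference between flat and per-layer clipping can be illustrated on a single example's gradient, represented here as one list of floats per layer. This is a minimal sketch under assumptions: the helper names and threshold values are hypothetical, the thresholds are constants rather than the adaptive ones the abstract describes, and the noise-addition step of differentially private training is omitted.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def flat_clip(layer_grads, threshold):
    """Clip using the joint norm over all layers of one example's gradient.

    The scale factor depends on every layer, so no layer can be rescaled
    until the gradients of all layers are known.
    """
    total = math.sqrt(sum(l2_norm(g) ** 2 for g in layer_grads))
    scale = min(1.0, threshold / total) if total > 0 else 1.0
    return [[x * scale for x in g] for g in layer_grads]


def per_layer_clip(layer_grads, thresholds):
    """Clip each layer's gradient separately against its own threshold.

    Each layer needs only its own norm, so it can be clipped as soon as
    backpropagation has produced that layer's gradient.
    """
    clipped = []
    for grad, threshold in zip(layer_grads, thresholds):
        norm = l2_norm(grad)
        scale = min(1.0, threshold / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in grad])
    return clipped


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.0, 12.0]]          # layer norms 5 and 12
    print(flat_clip(grads, 1.0))               # joint norm 13 scaled to 1
    print(per_layer_clip(grads, [1.0, 1.0]))   # each layer scaled to norm 1
```

With per-layer thresholds c_1, ..., c_K, the clipped gradient as a whole has norm at most sqrt(c_1^2 + ... + c_K^2), which is the bound a noise calibration would have to account for.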


Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning

Jacob Adamczyk , Argenis Arriojas , Stas Tiomkin , Rahul V. Kulkarni

Category: Machine Learning

2022-12-02

In reinforcement learning (RL), the ability to utilize prior knowledge from previously solved tasks can allow agents to quickly solve new problems. In some cases, these new problems may be approximately solved by composing the solutions of previously solved primitive tasks (task composition). Otherwise, prior knowledge can be used to adjust the reward function for a new problem, in a way that leaves the optimal policy unchanged but enables quicker learning (reward shaping). In this work, we develop a general framework for reward shaping and task composition in entropy-regularized RL. To do so, we derive an exact relation connecting the optimal soft value functions for two entropy-regularized RL problems with different reward functions and dynamics. We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL. We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL. We validate these theoretical contributions with experiments showing that reward shaping and task composition lead to faster learning in various settings.
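For orientation, the best-known instance of reward shaping that leaves the optimal policy unchanged is classical potential-based shaping in standard (unregularized) RL. It is shown here only as background; the abstract does not state the paper's entropy-regularized relation, and the symbols $\Phi$ (an arbitrary potential over states) and $\gamma$ (the discount factor) are notation assumed for this illustration.

```latex
\tilde{r}(s, a, s') = r(s, a, s') + \gamma\,\Phi(s') - \Phi(s)
\qquad\Longrightarrow\qquad
\tilde{Q}^{*}(s, a) = Q^{*}(s, a) - \Phi(s)
```

Because the shift $\Phi(s)$ does not depend on the action, $\arg\max_a \tilde{Q}^{*}(s, a) = \arg\max_a Q^{*}(s, a)$, so the greedy optimal policy is the same for the original and the shaped reward.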


A Novel Statistical Independence Test for Dynamic Causal Discovery with Rare Events

Chih-Yuan Chiu , Kshitij Kulkarni , Shankar Sastry

Category: Machine Learning (Statistics) | Machine Learning

2022-11-29

Causal phenomena associated with rare events frequently occur across a wide range of engineering and mathematical problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links between random variables that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel algorithm that performs statistical independence tests on data collected from time-invariant dynamical systems in which rare but consequential events occur. We seek to understand whether the state of the dynamical system causally affects the likelihood of the rare event. In particular, we exploit the time-invariance of the underlying data to superimpose the occurrences of rare events, thus creating a new dataset in which rare events are better represented and on which conditional independence tests can be performed more efficiently. We provide non-asymptotic bounds for the consistency of our algorithm, and validate its performance across various simulated scenarios, with applications to traffic accidents.
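One possible reading of "superimposing the occurrences of rare events" is to re-index the time series relative to each event, so that the states preceding every occurrence line up in a single dataset. The sketch below illustrates only that reading; the function name, the window parameter, and the data layout are hypothetical, and the abstract does not specify the authors' actual construction or test statistic.

```python
def superimpose_events(states, events, window):
    """Collect the `window` states immediately preceding each rare event.

    states: observed system states, one entry per time step
    events: booleans of the same length, True where the rare event occurred
    window: number of preceding time steps to keep per event (assumed)

    Returns one aligned window per event. Events that occur before a full
    window of history is available are skipped.
    """
    if len(states) != len(events):
        raise ValueError("states and events must have the same length")
    aligned = []
    for t, occurred in enumerate(events):
        if occurred and t >= window:
            aligned.append(states[t - window:t])
    return aligned


if __name__ == "__main__":
    states = [0, 1, 2, 3, 4, 5, 6, 7]
    events = [False, True, False, True, False, False, True, False]
    print(superimpose_events(states, events, window=2))
```

Aligning windows this way relies on time-invariance: if the system's behaviour does not depend on absolute time, windows taken at different event times can be pooled and treated as samples from the same distribution before running an independence test.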
